Separation of multiple concurrent speeches using audio-visual speaker localization and minimum variance beam-forming
Authors
Abstract
Speaker segmentation is an important task in multi-party conversations. Overlapping speech poses a serious problem when segmenting audio into speaker turns. We propose an audio-visual speech separation system consisting of an array microphone with eight sensors and an omnidirectional color camera. Multiple concurrent speeches are segmented by fusing the two heterogeneous sensors. Each segmented speech is further enhanced by a linearly constrained minimum variance beamformer. Even with co-existing wide-band sound sources and pictures of humans in a reverberant environment, the proposed system effectively separates multiple target speeches.
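The enhancement step named in the abstract, a linearly constrained minimum variance (LCMV) beamformer, can be sketched as follows. This is only a minimal illustration of the standard narrowband closed form w = R^-1 C (C^H R^-1 C)^-1 f, not the authors' implementation. The abstract gives only the number of sensors (eight); the uniform linear geometry, the 4 cm spacing, the 1 kHz frequency, the two source directions, the synthetic covariance matrix, and the helper names `steering_vector` and `lcmv_weights` are all assumptions made for the example.

```python
import numpy as np

def steering_vector(theta_rad, n_mics=8, spacing_m=0.04, freq_hz=1000.0, c=343.0):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    m = np.arange(n_mics)
    delay = m * spacing_m * np.sin(theta_rad) / c  # per-microphone delay in seconds
    return np.exp(-2j * np.pi * freq_hz * delay)

def lcmv_weights(R, C, f):
    """Minimize w^H R w subject to C^H w = f (standard LCMV closed form)."""
    Rinv_C = np.linalg.solve(R, C)  # R^-1 C without forming the inverse
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

# Hypothetical scene: target talker at +20 degrees, competing talker at -40 degrees.
a_target = steering_vector(np.deg2rad(20.0))
a_interf = steering_vector(np.deg2rad(-40.0))

# Synthetic spatial covariance: one strong interferer plus sensor noise.
R = 10.0 * np.outer(a_interf, a_interf.conj()) + 0.1 * np.eye(8)

# Constraints: unit gain toward the target, a null toward the interferer.
C = np.column_stack([a_target, a_interf])
f = np.array([1.0, 0.0], dtype=complex)

w = lcmv_weights(R, C, f)
gain_target = abs(np.vdot(w, a_target))  # |w^H a_target|
gain_interf = abs(np.vdot(w, a_interf))  # |w^H a_interf|
```

In a system like the one described, the constraint directions would presumably come from the audio-visual localization stage rather than being fixed by hand, and a wide-band signal would need one weight vector per frequency bin.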
Similar resources
Real-Time Speaker Localization and Speech Separation by Audio-Visual Integration
Robot audition in real-world should cope with motor and other noises caused by the robot's own movements in addition to environmental noises and reverberation. This paper reports how auditory processing is improved by audio-visual integration with active movements. The key idea resides in hierarchical integration of auditory and visual streams to disambiguate auditory or visual processing. Th...
Improvement of three simultaneous speech recognition by using AV integration and scattering theory for humanoid
This paper presents improvement of recognition of three simultaneous speeches for a humanoid robot with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult, because the number of simultaneous talkers exceeds that of its microphones, the signal-to-noise ratio is quite low (around -3 dB) and noise is not stable d...
Using audio and visual information for single channel speaker separation
This work proposes a method to exploit both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins by taking an effective audio-only method of speaker separation, namely the soft mask method, and modifying its operation to allow visual speech information to improve the separation process. The audio input is taken from a single chann...
Robust audio-visual speech synchrony detection by generalized bimodal linear prediction
We study the problem of detecting audio-visual synchrony in video segments containing a speaker in frontal head pose. The problem holds a number of important applications, for example speech source localization, speech activity detection, speaker diarization, speech source separation, and biometric spoofing detection. In particular, we build on earlier work, extending our previously proposed ti...
A crack localization method for beams via an efficient static data based indicator
In this paper, a crack localization method for Euler-Bernoulli beams via an efficient static data based indicator is proposed. The crack in beams is simulated here using a triangular variation in the stiffness. Static responses of a beam are obtained by the finite element modeling. In order to reduce the computational cost of damage detection method, the beam deflection is fitted through a poly...